Final Project Part-2

Author

Niharika Pola

Published

December 1, 2022

Code
library(tidyverse)
library(ggplot2)
library(readr)
library(corrplot)
library(reshape2)
library(corrgram)
library(ggpubr)
library(caTools)
library(GGally)

Background/Motivation

What does it take for a country or a continent to be happy? Is it the economy, life expectancy, freedom, or trust in the government? Which factors affect a country’s or a continent’s overall happiness? Can we predict the happiness score? Curiosity about these questions led me to explore the World Happiness data for 2022.

“This year marks the 10th anniversary of the World Happiness Report, which uses global survey data to report how people evaluate their own lives in more than 150 countries worldwide. The World Happiness Report 2022 reveals a bright light in dark times. The pandemic brought not only pain and suffering but also an increase in social support and benevolence. As we battle the ills of disease and war, it is essential to remember the universal desire for happiness and the capacity of individuals to rally to each other’s support in times of great need.” - World Happiness Report 2022

Research Question

The World Happiness data measures how happy the population of each country is and summarizes it in a single happiness score.

The data set breaks that score down into several variables: GDP per capita, freedom to make life choices, healthy life expectancy, perceptions of corruption, generosity, and social support.

In this study, I aim to find out answers to the following research questions:

  1. Which variables affect happiness across the world, at the level of individual countries and continents? This includes analyzing the correlations among the most influential variables.
  2. Which model predicts the happiness score most accurately?

Hypothesis

I wish to test the following hypotheses:

  1. Better economy of a country would lead to happiness
  2. Longer life expectancy would lead to happiness
  3. Having family/social support leads to happiness
  4. Freedom leads to happiness
  5. People’s trust in the Government leads to happiness
  6. Generosity leads to happiness

Data Preparation

Reading the data set

primary <- read.csv("project datasets/World Happiness Report 2022.csv")
head(primary)
str(primary)
str(primary$Country)

The dataset I have chosen is the World Happiness Report 2022 dataset from Kaggle. It gives the happiness rank and happiness score of 147 countries around the world, based on seven factors: GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, perceptions of corruption, and dystopia residual. A higher value for any of these factors means a higher level of happiness. Dystopia is the opposite of utopia: a hypothetical country with the lowest happiness level. It serves as a reference point that shows how far each real country is from the least happy country imaginable.

Source of the data: the World Happiness Report 2022 uses data from the Gallup World Poll surveys from 2019 to 2021. The scores are based on answers to the main life evaluation question asked in the poll.

Some of the variable names are not clear enough, so I renamed several of them. I also removed the whisker low and whisker high variables, because they only give the lower and upper bounds of the confidence interval for the happiness score and are not needed for visualization or prediction.

The next step is adding a continent column to the dataset. I want to compare continents to discover whether different factors play a significant role in achieving a higher happiness score in each of them. The six continents in this dataset are Asia, Africa, North America, South America, Europe, and Australia. I then moved the continent column to the second position, because the dataset reads better with it next to the country name. Finally, I changed the type of the continent variable to factor so that it is easy to work with in visualizations.
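
The continent steps described above can be sketched on a tiny example. This is only an illustration under stated assumptions: `continent_lookup` and `toy` are hypothetical names, the lookup covers just a handful of countries, and the scores are made up rather than taken from the report.

```r
# Hypothetical lookup table; only a few countries are listed for brevity.
continent_lookup <- c(
  "Japan" = "Asia", "Norway" = "Europe", "Canada" = "North America",
  "Chile" = "South America", "New Zealand" = "Australia"
)

# Toy data with made-up scores (NOT real report values).
toy <- data.frame(
  Country         = c("Norway", "Japan", "Kenya", "Chile"),
  Happiness.Score = c(7.4, 6.0, 4.5, 6.2),
  stringsAsFactors = FALSE
)

# Look each country up by name; names missing from the lookup come back as NA.
toy$Continent <- unname(continent_lookup[toy$Country])

# As in this post, every country that is still unassigned falls through to "Africa".
toy$Continent[is.na(toy$Continent)] <- "Africa"

# Store the column as a factor so plotting functions treat it as categorical.
toy$Continent <- factor(toy$Continent)

# Move Continent to the second position, right after Country.
toy <- toy[, c("Country", "Continent",
               setdiff(names(toy), c("Country", "Continent")))]
```

A named vector keeps the country-to-continent mapping in one place, which makes it easier to spot a country that was silently assigned to the fall-through group.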

Preparation of the data

Code
# Changing the name of columns
colnames (primary) <- c("Happiness.Rank", "Country",  "Happiness.Score",
                          "Whisker.High", "Whisker.Low", "Dystopia.Residual",   "Economy", "Family", "Life.Expectancy", "Freedom", "Generosity",
                          "Trust")
# Country: Name of countries
# Happiness.Rank: Rank of the country based on the Happiness Score
# Happiness.Score: Happiness measurement on a scale of 0 to 10
# Whisker.High: Upper confidence interval of happiness score
# Whisker.Low: Lower confidence interval of happiness score
# Economy: Portion of the score explained by GDP per capita
# Family: Portion of the score explained by social support
# Life.Expectancy: Portion of the score explained by healthy life expectancy
# Freedom: Portion of the score explained by freedom to make life choices
# Generosity: Portion of the score explained by generosity
# Trust: Portion of the score explained by perceptions of corruption
# Dystopia.Residual: Dystopia baseline plus the unexplained residual; serves as a reference

# Deleting unnecessary columns (Whisker.high and Whisker.low)

primary <- primary[, -c(4,5)]

# Adding a Continent column, filled in below
primary$Continent <- NA
primary$Continent[which(primary$Country %in% c("Israel", "United Arab Emirates", "Singapore", "Thailand", "Taiwan Province of China",
                                   "Qatar", "Saudi Arabia", "Kuwait", "Bahrain", "Malaysia", "Uzbekistan", "Japan",
                                   "South Korea", "Turkmenistan", "Kazakhstan", "Turkey", "Hong Kong S.A.R., China", "Philippines",
                                   "Jordan", "China", "Pakistan", "Indonesia", "Azerbaijan", "Lebanon", "Vietnam",
                                   "Tajikistan", "Bhutan", "Kyrgyzstan", "Nepal", "Mongolia", "Palestinian Territories",
                                   "Iran", "Bangladesh", "Myanmar", "Iraq", "Sri Lanka", "Armenia", "India", "Georgia",
                                   "Cambodia", "Afghanistan", "Yemen", "Syria"))] <- "Asia"
primary$Continent[which(primary$Country %in% c("Norway", "Denmark", "Iceland", "Switzerland", "Finland",
                                   "Netherlands", "Sweden", "Austria", "Ireland", "Germany",
                                   "Belgium", "Luxembourg", "United Kingdom", "Czech Republic",
                                   "Malta", "France", "Spain", "Slovakia", "Poland", "Italy",
                                   "Russia", "Lithuania", "Latvia", "Moldova", "Romania",
                                   "Slovenia", "North Cyprus", "Cyprus", "Estonia", "Belarus",
                                   "Serbia", "Hungary", "Croatia", "Kosovo", "Montenegro",
                                   "Greece", "Portugal", "Bosnia and Herzegovina", "Macedonia",
                                   "Bulgaria", "Albania", "Ukraine"))] <- "Europe"
primary$Continent[which(primary$Country %in% c("Canada", "Costa Rica", "United States", "Mexico",  
                                   "Panama","Trinidad and Tobago", "El Salvador", "Belize", "Guatemala",
                                   "Jamaica", "Nicaragua", "Dominican Republic", "Honduras",
                                   "Haiti"))] <- "North America"
primary$Continent[which(primary$Country %in% c("Chile", "Brazil", "Argentina", "Uruguay",
                                   "Colombia", "Ecuador", "Bolivia", "Peru",
                                   "Paraguay", "Venezuela"))] <- "South America"
primary$Continent[which(primary$Country %in% c("New Zealand", "Australia"))] <- "Australia"
# Every remaining country is assigned to Africa
primary$Continent[which(is.na(primary$Continent))] <- "Africa"

# Moving the continent column's position in the dataset to the second column

primary <- primary %>% select(Country,Continent, everything())

# Changing the type of Continent to factor
primary$Continent <- as.factor(primary$Continent)

# Renaming the final dataframe to happy
# (GGally also ships a dataset named `happy`, so if this assignment is skipped,
# later chunks silently use that unrelated data instead)
happy <- primary
str(happy)

Visualization

Analyzing the correlation between each numeric variable

As we already know, the sum of these numeric variables gives the happiness score, and there is an inverse relationship between happiness score and happiness rank: the higher the happiness score, the lower (better) the rank number. Hence there is no need to look at the correlation between each numeric variable and happiness rank. We can look directly at the happiness score and the other numeric variables.
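
A quick sanity check of that identity can be sketched as follows. The three rows of `toy` are made-up values used purely for illustration, not figures from the report; only the column names follow this post.

```r
# Toy rows with made-up values (NOT real report data); column names follow this post.
toy <- data.frame(
  Economy           = c(1.5, 1.0, 0.5),
  Family            = c(1.2, 0.9, 0.6),
  Life.Expectancy   = c(0.7, 0.6, 0.3),
  Freedom           = c(0.6, 0.5, 0.4),
  Generosity        = c(0.2, 0.1, 0.2),
  Trust             = c(0.4, 0.1, 0.1),
  Dystopia.Residual = c(2.0, 1.8, 1.5)
)

# The score is the row-wise sum of the seven components.
toy$Happiness.Score <- rowSums(toy)

# Rank 1 goes to the highest score, so rank and score move in opposite directions.
toy$Happiness.Rank <- rank(-toy$Happiness.Score)

# The correlation between score and rank is therefore negative.
cor(toy$Happiness.Score, toy$Happiness.Rank)
```

Because rank is just a reordering of the score, correlating the factors with rank would repeat the same information with the sign flipped.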

Code
# Create a correlation plot (columns 4 to 11: happiness score and the seven factors)
ggcorr(happy[, 4:11], label = TRUE, label_round = 2, label_size = 3.5, size = 2, hjust = .85) +
  ggtitle("Correlation Heatmap") +
  theme(plot.title = element_text(hjust = 0.5))

According to the correlation plot above, economy, life expectancy, and family are most strongly correlated with the happiness score. Trust and generosity have the weakest correlation with it.

Comparing different continents regarding their happiness variables

Let’s calculate the average happiness score and the average of the other seven variables for each continent, then melt the result so that variables and values sit in separate columns. Finally, we use ggplot to show the differences between continents.
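
The group-then-reshape idea can also be sketched in base R alone. This is a minimal illustration, assuming a toy data frame with made-up values and only two numeric columns; `means` and `long` are hypothetical names.

```r
# Toy data with made-up values (NOT real report data).
toy <- data.frame(
  Continent       = c("Europe", "Europe", "Asia", "Asia"),
  Happiness.Score = c(7.0, 6.0, 5.0, 4.0),
  Economy         = c(1.8, 1.6, 1.2, 1.0)
)

# Step 1: average every numeric column within each continent.
means <- aggregate(. ~ Continent, data = toy, FUN = mean)

# Step 2: reshape from wide to long, giving one row per continent and variable.
long <- data.frame(
  Continent = rep(means$Continent, times = ncol(means) - 1),
  variable  = rep(names(means)[-1], each = nrow(means)),
  value     = unlist(means[-1], use.names = FALSE)
)

long
```

The long layout is what `facet_wrap(~variable)` needs: one panel per variable, with continents on the x axis.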

Code
happy.Continent <- happy %>%
                          select(-3) %>%
                          group_by(Continent) %>%
                          summarise(across(-Country, ~ mean(.x, na.rm = TRUE)))

# Or we can use aggregate
# aggregate(happy[, 4:11], list(happy$Continent), mean)

# Melting the "happy.Continent" dataset
happy.Continent.melt <- melt(as.data.frame(happy.Continent), id.vars = "Continent")

# Faceting
ggplot(happy.Continent.melt, aes(y=value, x=Continent, color=Continent, fill=Continent)) + 
  geom_bar( stat="identity") +    
  facet_wrap(~variable) + theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Average value of happiness variables for different continents", 
       y = "Average value") 

We can see that Australia has approximately the highest average in all fields except dystopia residual, after that Europe, North America, and South America are roughly the same regarding happiness score and the other seven factors. Finally, Asia and Africa have the lowest scores in all fields.

Correlation plot for each continent

Let’s see the correlation between variables for each continent.

Code
ggcorr(happy %>% select(-3) %>% filter(Continent == "Africa"), label = TRUE, label_round = 2, label_size = 3.5, size = 3.5, hjust = .85) +
  ggtitle("Happiness Matrix for Africa") +
  theme(plot.title = element_text(hjust = 0.5))

Correlation between “Happiness Score” and the other variables in Africa:
Economy > Life Expectancy > Family > Dystopia Residual > Freedom

Happiness score and Generosity are inversely correlated.
There is no correlation between happiness score and trust.

Code
ggcorr(happy %>% select(-3) %>% filter(Continent == "Asia"), label = TRUE, label_round = 2, label_size = 3.5, size = 3.5, hjust = .85) +
  ggtitle("Happiness Matrix for Asia") +
  theme(plot.title = element_text(hjust = 0.5))

Correlation between “Happiness Score” and the other variables in Asia:
Family > Dystopia.Residual > Freedom > Economy > Life Expectancy > Trust

There is no correlation between happiness score and Generosity.

Code
ggcorr(happy %>% select(-3) %>% filter(Continent == "Europe"), label = TRUE, label_round = 2, label_size = 3.5, size = 3.5, hjust = .85) +
  ggtitle("Happiness Matrix for Europe") +
  theme(plot.title = element_text(hjust = 0.5))

Correlation between “Happiness Score” and the other variables in Europe:
Trust > Economy > Freedom > Life Expectancy > Dystopia Residual > Family > Generosity

Code
ggcorr(happy %>% select(-3) %>% filter(Continent == "North America"), label = TRUE, label_round = 2, label_size = 3.5, size = 3.5, hjust = .85) +
  ggtitle("Happiness Matrix for North America") +
  theme(plot.title = element_text(hjust = 0.5))

Correlation between “Happiness Score” and the other variables in North America:
Economy > Life Expectancy > Family > Generosity > Trust
There is an inverse correlation between happiness score and dystopia residual.

Code
ggcorr(happy %>% select(-3) %>% filter(Continent == "South America"), label = TRUE, label_round = 2, label_size = 3.5, size = 3.5, hjust = .85) +
  ggtitle("Happiness Matrix for South America") +
  theme(plot.title = element_text(hjust = 0.5))

Correlation between “Happiness Score” and the other variables in South America:
Dystopia.Residual > Trust > Family > Freedom > Life Expectancy
There is an inverse correlation between happiness score and dystopia residual, Generosity.
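
Orderings like the ones above can also be computed rather than read off the heatmaps. The sketch below is an illustration only: `rank_correlations` is a hypothetical helper, and `toy` holds made-up values with two placeholder continents "A" and "B".

```r
# Hypothetical helper: correlate every numeric column with the target column
# and return the correlations sorted from highest to lowest.
rank_correlations <- function(df, target = "Happiness.Score") {
  numeric_cols <- df[sapply(df, is.numeric)]
  predictors <- setdiff(names(numeric_cols), target)
  r <- sapply(predictors,
              function(v) cor(numeric_cols[[v]], numeric_cols[[target]]))
  sort(r, decreasing = TRUE)
}

# Toy data with made-up values (NOT real report data).
toy <- data.frame(
  Continent       = rep(c("A", "B"), each = 4),
  Happiness.Score = c(1, 2, 3, 4, 1, 2, 3, 4),
  Economy         = c(1, 2, 3, 5, 4, 3, 2, 1),
  Trust           = c(2, 1, 4, 3, 1, 3, 2, 4)
)

# One sorted correlation vector per continent.
by_continent <- lapply(split(toy, toy$Continent), rank_correlations)
by_continent
```

Negative entries at the end of a vector correspond to the inverse correlations noted in the text, and values near zero correspond to "no correlation".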

Happiness score comparison on different continents

We will use a scatter plot, a box plot, and a violin plot to see how the happiness score is distributed within each continent, and we will also calculate the mean and median happiness score for each continent.

Code
####### Happiness score for each continent

gg1 <- ggplot(happy,
              aes(x=Continent,
                  y=Happiness.Score,
                  color=Continent))+
  geom_point() + theme_bw() +
  theme(axis.title = element_text(family = "Helvetica", size = (8)))

gg2 <- ggplot(happy , aes(x = Continent, y = Happiness.Score)) +
  geom_boxplot(aes(fill=Continent)) + theme_bw() +
  theme(axis.title = element_text(family = "Helvetica", size = (8)))

gg3 <- ggplot(happy,aes(x=Continent,y=Happiness.Score))+
  geom_violin(aes(fill=Continent),alpha=0.7)+ theme_bw() +
  theme(axis.title = element_text(family = "Helvetica", size = (8)))

# Compute descriptive statistics by groups
stable <- desc_statby(happy, measure.var = "Happiness.Score",
                      grps = "Continent")
stable <- stable[, c("Continent","mean","median")]
names(stable) <- c("Continent", "Mean of happiness score","Median of happiness score")

# Summary table plot
stable.p <- ggtexttable(stable,rows = NULL, 
                         theme = ttheme("classic"))

ggarrange(gg1, gg2, ncol = 1, nrow = 2)
ggarrange(gg3, stable.p, ncol = 1, nrow = 2)

As we have seen before, Australia has the highest median happiness score. Europe, South America, and North America come second. Africa has the lowest median, followed by Asia. The plots also show the range of happiness scores for each continent and where the scores are concentrated.

Find Relationship using Scatter Plot of Happiness Score with each variable (include regression line)

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Life.Expectancy, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Life Expectancy)")

The correlation between life expectancy and happiness score is stronger in Europe, North America, and Asia than on the other continents. Australia is left out of these plots because it contains only two countries, so a scatter plot with a regression line would not give us any insight.

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Economy, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Economy)")

The result is much the same for the correlation between happiness score and economy. Africa has the weakest relationship in this regard.

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Freedom, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Freedom)")

Freedom is more strongly correlated with happiness score in Europe and North America than on any other continent.

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Trust, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Trust)")

There is approximately no correlation between trust and happiness score in Africa.

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Generosity, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Generosity)")

The regression line has a positive slope only for Europe and South America. For Asia the line is horizontal, and for Africa and North America the slope is negative.

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Family, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Family)")

In South America, the happiness score stays roughly constant as the family score increases.

Code
ggplot(subset(happy, happy$Continent != "Australia"), aes(x = Dystopia.Residual, y = Happiness.Score)) + 
  geom_point(aes(color=Continent), size = 3, alpha = 0.8) +  
  geom_smooth(aes(color = Continent, fill = Continent), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~Continent) +
  theme_bw() + labs(title = "Scatter plot with regression line (Happiness Score & Dystopia Residual)")

All continents behave much the same with regard to dystopia residual.

Prediction using Regression Models

In this section, we will implement several machine learning algorithms to predict happiness score. First, we should split our dataset into training and test set. Our dependent variable is happiness score, and the independent variables are family, economy, life expectancy, trust, freedom, generosity, and dystopia residual.

Code
# Splitting the dataset into the Training set and Test set
set.seed(123)
dataset <- happy[4:11]
split = sample.split(dataset$Happiness.Score, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Fitting Multiple Linear Regression to the Training set
regressor_lm = lm(formula = Happiness.Score ~ .,
               data = training_set)
summary(regressor_lm)

The summary shows that all independent variables have a significant impact, and the adjusted R squared is 1! As discussed, there is an exact linear relationship between the dependent and independent variables: the sum of the independent variables equals the dependent variable, the happiness score. This explains the adjusted R squared of 1. As a result, I expect Multiple Linear Regression to predict happiness scores with essentially 100% accuracy!
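
The reason for the perfect fit can be reproduced on simulated data. The following is a sketch under stated assumptions: the columns `a`, `b`, `c`, and `score` are hypothetical stand-ins for the factors and the happiness score, filled with random numbers rather than report values.

```r
set.seed(1)

# Three simulated predictors (random numbers, NOT real report data).
toy <- data.frame(a = runif(30), b = runif(30), c = runif(30))

# The response is constructed as the exact sum of the predictors,
# mirroring how the happiness score is the sum of its components.
toy$score <- toy$a + toy$b + toy$c

fit <- lm(score ~ ., data = toy)

# The intercept is (numerically) 0 and every slope is (numerically) 1.
round(coef(fit), 6)

# R warns about an "essentially perfect fit" here, so the warning is silenced.
adj_r2 <- suppressWarnings(summary(fit))$adj.r.squared
adj_r2
```

In other words, the regression simply recovers the definition of the score, so the perfect fit says nothing about how well the factors would predict happiness if the residual term were unknown.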

Code
####### Predicting the Test set results
y_pred_lm = predict(regressor_lm, newdata = test_set)
Pred_Actual_lm <- as.data.frame(cbind(Prediction = y_pred_lm, Actual = test_set$Happiness.Score))

gg.lm <- ggplot(Pred_Actual_lm, aes(Actual, Prediction )) +
  geom_point() + theme_bw() + geom_abline() +
  labs(title = "Multiple Linear Regression", x = "Actual happiness score",
       y = "Predicted happiness score") +
  theme(plot.title = element_text(family = "Helvetica", face = "bold", size = (15)), 
        axis.title = element_text(family = "Helvetica", size = (10)))
gg.lm

As we expected, the actual versus predicted plot confirms the accuracy of our model.
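
Alongside the plot, accuracy can be summarized with a number. Below is a minimal sketch of two common metrics, root mean squared error and R squared on held-out data; `rmse` and `r_squared` are hypothetical helper names, and the `actual` and `predicted` vectors are made up for illustration.

```r
# Root mean squared error: typical size of a prediction error, in score units.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# R squared: share of the variance in the actual values explained by the predictions.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Made-up values (NOT model output from this post).
actual    <- c(7.0, 6.0, 5.0, 4.0)
predicted <- c(6.8, 6.1, 5.3, 4.0)

rmse(actual, predicted)
r_squared(actual, predicted)
```

With the objects from this post, the same helpers could be called as `rmse(test_set$Happiness.Score, y_pred_lm)`, which makes it possible to compare models on one scale.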

Code
# Fitting the same model to the Training set with the predictors written out explicitly
regressor_lm_2 = lm(formula = Happiness.Score ~ Trust + Life.Expectancy + Generosity + Economy + Family + Freedom + Dystopia.Residual,
               data = training_set)
summary(regressor_lm_2)

####### Predicting the Test set results
y_pred_lm = predict(regressor_lm_2, newdata = test_set)
Pred_Actual_lm <- as.data.frame(cbind(Prediction = y_pred_lm, Actual = test_set$Happiness.Score))

gg.lm <- ggplot(Pred_Actual_lm, aes(Actual, Prediction )) +
  geom_point() + theme_bw() + geom_abline() +
  labs(title = "Multiple Linear Regression", x = "Actual happiness score",
       y = "Predicted happiness score") +
  theme(plot.title = element_text(family = "Helvetica", face = "bold", size = (15)), 
        axis.title = element_text(family = "Helvetica", size = (10)))
gg.lm